STATISTICAL INFERENCE

Biostatistics Support and Research Unit

Germans Trias i Pujol Research Institute and Hospital (IGTP)
Badalona

March 19, 2025

Summary

1. Population and sample

2. Central Limit Theorem

3. Confidence Interval

4. Hypothesis test

5. Statistical vs clinical significance

Introduction

Population and sample


Population: the set of elements that share one or more observable characteristics; these shared characteristics are known as inclusion criteria.

  • Example: The set of adults (>17 years) with hypertension.

  • Finite or infinite.

Population and sample


Sample: a subset of elements of a population.

  • It is finite and of a reasonable size.

  • It can be a representative (random) sample or a convenience sample.

Inference


Statistical inference is the set of methods that allow conclusions to be drawn about a population from a sample.

This sample must be representative and made up of randomly selected individuals to avoid bias.

We will use characteristics of the sample to infer the characteristics of the population.

Parameter and Statistic


Parameter: A numerical value that describes a characteristic of a population.

Population mean: \[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]

It’s constant and usually unknown.

Parameter and Statistic


Statistic: A numerical value calculated from a sample that describes a characteristic of the sample.

Sample mean: \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

It’s variable and can be calculated.

Point estimate


Sample mean: \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Standard deviation: \[ S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i-\bar{x})^2}{n-1}} \]

Sample proportion: \[ \hat{p} = \frac{x}{n} \] where \(x\) is the number of events observed in the sample.
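These estimators can be computed directly; a minimal Python sketch using Sample #1 from the "From sample to sample" slide (the 5-out-of-20 proportion is illustrative):

```python
import statistics

# Sample #1 from the deck
weights = [94.3, 80.3, 69.7, 77.0, 75.7, 102.2, 79.7, 101.5, 99.7, 59.1]

x_bar = statistics.mean(weights)   # sample mean: sum(x_i) / n
s_x = statistics.stdev(weights)    # sample SD, with n - 1 in the denominator

# Sample proportion: x events out of n observations
p_hat = 5 / 20

print(round(x_bar, 1), round(s_x, 1), p_hat)  # 83.9 14.8 0.25
```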

From sample to sample


How do we go from a sample statistic to a population parameter?



Note

The key concept is the sampling distribution: the probability distribution of a statistic.

From sample to sample


Weight measured on a random sample of 10 subjects:

- Sample #1
 [1]  94.3  80.3  69.7  77.0  75.7 102.2  79.7 101.5  99.7  59.1
Mean: 83.9
- Sample #2
 [1]  95.2 101.2  94.4  73.3  84.4  70.1  74.8  62.9  89.6  73.6
Mean: 82
- Sample #3
 [1]  72.7  70.8  82.9  70.2  73.9  76.2 100.7  77.1  81.3  88.0
Mean: 79.4
- Sample #4
 [1]  62.0 107.0  81.1  79.5  77.3  52.4  69.7  86.6  60.0  70.8
Mean: 74.6

Characteristic   N = 4¹
Weight (Kg)      80.0 (4.0)
¹ Mean (SD)
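This sample-to-sample variation can be simulated; a sketch assuming, as the CLT slide later states, that weight follows \(N(\mu=80, \sigma=15)\):

```python
import random
import statistics

random.seed(1)
MU, SIGMA, N = 80, 15, 10  # weight ~ N(80, 15), samples of 10 subjects

# Draw 4 random samples and compute each sample mean
samples = [[random.gauss(MU, SIGMA) for _ in range(N)] for _ in range(4)]
means = [round(statistics.mean(s), 1) for s in samples]
print(means)  # four different means, scattered around mu = 80
```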

From sample to sample

What if we have 10 samples:

… 50 samples:

… 100 samples:

From sample to sample

… 500 samples:

… 1000 samples:

From sample to the population

Characteristic   N = 50¹
Weight (Kg)      79.7 (4.3)
¹ Mean (SD)

Characteristic   N = 100¹
Weight (Kg)      80.6 (5.0)
¹ Mean (SD)

Characteristic   N = 500¹
Weight (Kg)      79.8 (4.8)
¹ Mean (SD)

Characteristic   N = 2,000¹
Weight (Kg)      79.9 (4.8)
¹ Mean (SD)

Central Limit Theorem


For large enough \(n\), the sampling distribution of \(\bar{x}\) tends to a normal distribution with mean \(\mu\) and standard deviation \(\frac{\sigma}{\sqrt{n}}\), where \(\sigma\) is the population standard deviation.


\[ \bar{x} \sim Normal\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right) \] where \(n\) is the number of subjects in each sample.

Central Limit Theorem

\[ \scriptsize x \sim Normal(\mu=80,\sigma=15) \]

\[ \scriptsize \bar{x} \sim Normal(\mu,\frac{\sigma}{\sqrt{n}}) \]

\[ \scriptsize \sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}=\frac{15}{\sqrt{10}}=4.7 \]

\[ \scriptsize \bar{x} \sim Normal(\mu=80,\frac{\sigma}{\sqrt{n}}=4.7) \]
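This can be checked by simulation; a sketch drawing many samples of \(n=10\) from \(N(80, 15)\) and looking at the spread of their means:

```python
import random
import statistics

random.seed(42)
MU, SIGMA, N = 80, 15, 10

# 10,000 sample means: their SD should approach sigma/sqrt(n) = 15/sqrt(10) = 4.7
sample_means = [statistics.mean(random.gauss(MU, SIGMA) for _ in range(N))
                for _ in range(10_000)]

print(round(statistics.mean(sample_means), 1))   # close to mu = 80
print(round(statistics.stdev(sample_means), 2))  # close to 4.74
```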

Central Limit Theorem… to the limit

10 samples of size 500

Characteristic   N = 10¹
Rare             374.9 (4.9)
¹ Mean (SD)

30 samples of size 500

Characteristic   N = 30¹
Rare             374.9 (3.1)
¹ Mean (SD)

50 samples of size 500

Characteristic   N = 50¹
Rare             374.7 (3.9)
¹ Mean (SD)

Central Limit Theorem… to the limit

What if we modify the sample size…

1000 samples of size 50

Characteristic   N = 1,000¹
Rare             374.9 (13.4)
¹ Mean (SD)

1000 samples of size 500

Characteristic   N = 1,000¹
Rare             375.1 (4.0)
¹ Mean (SD)

1000 samples of size 50K

Characteristic   N = 1,000¹
Rare             375.0 (0.4)
¹ Mean (SD)

Standard Error


The standard error (SE) measures how much a sample statistic varies from sample to sample.

\[ \text{SE of the mean}= \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \] …with the population standard deviation \(\sigma\)

\[ \text{SE of the mean}= s_{\bar{x}} = \frac{s}{\sqrt{n}} \] …estimated in a sample, with the sample standard deviation \(s\)

Standard Error

- Sample:

 [1]  88.0  50.7 111.4  65.4  83.3  81.0  44.0  82.8  71.8  99.9

- Mean:

[1] 77.84014

- Standard Deviation:

[1] 20.70304

- Standard Error:

[1] 6.546876
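These numbers can be reproduced from the printed sample (the values shown are rounded to one decimal, so the last digits differ slightly); a Python sketch:

```python
import math
import statistics

sample = [88.0, 50.7, 111.4, 65.4, 83.3, 81.0, 44.0, 82.8, 71.8, 99.9]

mean = statistics.mean(sample)      # ~77.8
sd = statistics.stdev(sample)       # ~20.7
se = sd / math.sqrt(len(sample))    # ~6.5, the standard error of the mean

print(round(mean, 1), round(sd, 1), round(se, 1))  # 77.8 20.7 6.5
```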

Standard Error


Lower limit: the 2.5th percentile of \(N(\mu=77.8,\ \sigma=6.5)\): \(\bar{x} - 1.96\, s_{\bar{x}} = 65\)

Upper limit: the 97.5th percentile of \(N(\mu=77.8,\ \sigma=6.5)\): \(\bar{x} + 1.96\, s_{\bar{x}} = 90.7\)

Confidence interval


A range of values, derived from a sample, that is likely to contain the true population parameter with a specified confidence level (e.g., 95% or 99%).

\[ \text{Mean confidence interval 95\%} = \bar{x} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}} \] Where \(Z_{\alpha/2}\) is the \(1-\frac{\alpha}{2}\) quantile of \(N(0,1)\) (1.96 for a 95% confidence level).
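Applied to the sample from the Standard Error slide, this formula reproduces the 65 and 90.7 limits shown there; a sketch:

```python
import math
import statistics
from statistics import NormalDist

sample = [88.0, 50.7, 111.4, 65.4, 83.3, 81.0, 44.0, 82.8, 71.8, 99.9]

x_bar = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))
z = NormalDist().inv_cdf(0.975)  # ~1.96 for a 95% confidence level

ci_low, ci_high = x_bar - z * se, x_bar + z * se
print(f"95% CI: [{ci_low:.1f}, {ci_high:.1f}]")  # roughly [65.0, 90.7]
```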

Confidence interval: interpretation


  • Common misinterpretation: “the presented 95% confidence interval has a 95% chance of containing the true value”. Once computed, a given interval either contains the true value or it does not.

  • Given 100 independent samples and their 95% confidence intervals of the mean, around 95 intervals would contain the true value.

  • 95% refers only to how often confidence intervals computed from many studies would contain the true value.

Confidence interval

Confidence interval


Given 100 independent samples and their 95% confidence intervals of the mean, around 5 would NOT contain the true value.

95% refers only to how often confidence intervals computed from many studies would contain the true value.

Confidence interval


To be honest… when the population \(\sigma\) is unknown (and especially when \(n\) is small), you should use Student’s t distribution.

\[ \text{Mean confidence interval} = \bar{x} \pm t \frac{s}{\sqrt{n}} \]

Where \(t = t_{1-\frac{\alpha}{2},\,(n-1)}\), the corresponding quantile of the t distribution with \(n-1\) degrees of freedom.
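As a worked example (not from the deck), take the Standard Error slide's sample, where \(\bar{x} = 77.8\), \(s_{\bar{x}} = 6.5\) and \(n = 10\); the tabulated quantile is \(t_{0.975,\,9} = 2.262\):

\[ \bar{x} \pm t_{0.975,\,9}\, s_{\bar{x}} = 77.8 \pm 2.262 \times 6.5 = [63.1,\ 92.5] \]

With only 10 subjects, the t interval is wider than the z-based one ([65, 90.7]), reflecting the extra uncertainty from estimating \(\sigma\) with \(s\).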

Take a deep breath…

Uganda


What % of Uganda’s roads are paved?


Uganda

CIA’s World Factbook

CIA knows: https://www.cia.gov/the-world-factbook/

CIA’s World Factbook


Total: 20,544 km (excludes local roads)
Paved: 4,257 km (20.7%)


Is our estimate reliable or biased?

Hypothesis test


A statistical method to assess whether the evidence in a sample supports or contradicts a claim about a population parameter.

Hypothesis test


The null hypothesis \(H_{0}\) is the assumption tested in statistical analysis, stating that a population parameter meets a specific condition, such as no effect or no difference.


\[ H_{0}: \pi_{\text{paved}} = 20.7\% \]

Hypothesis test


The alternative hypothesis \(H_{1}\) is a statement that contradicts the null hypothesis, proposing that there is an effect, a difference, or an association in the population.


\[ H_{1}: \pi_{\text{paved}} \ne 20.7\% \]

Hypothesis test


A Type I error occurs when the null hypothesis is rejected despite being true, with its probability defined by the significance level (\(\alpha\)).


\(\alpha=0.05\) ; \(\alpha=0.01\)


This risk should be considered before conducting the test.

Hypothesis test


A Type II error occurs when the null hypothesis is not rejected despite being false, with its probability denoted as \(\beta\); its complement \((1 - \beta)\) is called statistical power.


\(\text{power}=(1-\beta)=0.8\) ; \(\text{power}=(1-\beta)=0.9\)


This risk should be considered before conducting the test.

MAP Crunch vs CIA’s World Factbook


Null hypothesis: 20.7%
Our data says: 5 events out of 20 possible events.
So, the estimated % of paved roads is 25%, 95% CI [11.2% to 46.9%]
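The interval quoted above is consistent with the Wilson score interval for a proportion; a minimal Python sketch (a reimplementation for illustration, not the deck's code):

```python
import math
from statistics import NormalDist

def wilson_ci(x, n, conf=0.95):
    """Wilson score confidence interval for a proportion x/n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # ~1.96 for 95%
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

low, high = wilson_ci(5, 20)  # 5 paved roads out of 20 locations
print(f"{low:.1%} to {high:.1%}")  # 11.2% to 46.9%
```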


Is this data consistent with our hypothesis?

How does it work?


\[ H_{0}: \pi_{\text{paved}} = 20.7\% \] \[ H_{1}: \pi_{\text{paved}} \ne 20.7\% \]


Is the difference \(p_{\text{paved}} - 20.7\% = 0\)?


Under the null hypothesis the difference is 0, with standard error \(SE_0 = \sqrt{\frac{\pi_{0}(1-\pi_{0})}{n}}\), where \(\pi_{0} = 20.7\%\)


Thus, the difference distribution is \(\sim N\left(\mu=0,\ \sigma=\sqrt{\frac{\pi_{0}(1-\pi_{0})}{n}}\right)\)
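Under these assumptions the procedure reduces to a one-sample proportion z-test; a Python sketch that reproduces the p-values used on the following slides:

```python
import math
from statistics import NormalDist

def prop_ztest(x, n, pi0=0.207):
    """Two-sided one-sample z-test of a proportion x/n against pi0."""
    p = x / n
    se0 = math.sqrt(pi0 * (1 - pi0) / n)  # SE under the null hypothesis
    z = (p - pi0) / se0
    return 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(round(prop_ztest(5, 20), 3))     # 0.635  -> fail to reject H0
print(round(prop_ztest(10, 20), 4))    # 0.0012 -> reject H0
print(round(prop_ztest(100, 400), 4))  # 0.0338 -> reject H0
```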

How does it work?

Proportion difference distribution under the null hypothesis.

How does it work?

Proportion difference distribution under the null hypothesis.


How likely is it to observe a difference of \(p_{\text{paved}} - 20.7\% = 4.3\%\) under the null hypothesis?

How does it work?

Proportion difference distribution under the null hypothesis.


Given the predefined Type I error rate, we fail to reject the null hypothesis because the probability of observing a difference of 4.3% or more extreme under the null hypothesis is 0.635 > 0.05.

What if?


Null hypothesis: 20.7%
Our data says: 10 events out of 20 possible events.
So the estimated % of paved roads is 50%, with a 95% CI [29.9% to 70.1%]


Is this data consistent with our hypothesis?

What if?

Proportion difference distribution under the null hypothesis.


Given the predefined Type I error rate, we reject the null hypothesis because the probability of observing a difference of 29.3% or more extreme under the null hypothesis is 0.0012 \(\le\) 0.05.

What if? (more)


Null hypothesis: 20.7%
Our data says: 100 events out of 400 possible events.
So the estimated % of paved roads is 25%, with a 95% CI [21% to 29.5%]


Is this data consistent with our hypothesis?

What if? (more)

Proportion difference distribution under the null hypothesis.


Given the predefined Type I error rate, we reject the null hypothesis because the probability of observing a difference of 4.3% or more extreme under the null hypothesis is 0.0338 \(\le\) 0.05.

P value


  • Misconception: the P value is the probability that the test hypothesis is true.

  • Misconception: the P value is the probability that chance alone produced the observed association.

  • The P value is the probability of obtaining a result as extreme as the observed one (or more extreme) under the assumption that the null hypothesis is true.

Statistical vs clinical significance


In clinical research, the use of hypothesis testing and p-values as a criterion of relevance has become widespread to the point of overuse.

Statistical significance, as defined above, indicates whether an observed result is unlikely under the null hypothesis, usually using cut-off points such as 0.05 or 0.01.

Statistical vs clinical significance


However, a statistically significant result does not indicate the size of the effect or its clinical relevance.

The clinical significance of a finding is determined by assessing whether the effect is large enough to influence medical practice or decision making.

The ASA’s statement


  • P-values do not measure the probability that the hypothesis being tested is true, or the probability that the data were generated by chance alone.

  • Scientific conclusions and business or policy decisions should not be based solely on the fact that the p-value exceeds a certain threshold.

  • A p-value, or statistical significance, does not measure the size of an effect or the importance of an outcome.




Ronald L. Wasserstein & Nicole A. Lazar (2016). The ASA’s statement on p-values: context, process, and purpose. The American Statistician, 70(2), 129–133.

Final messages


  • Inference is a SUPER POWER.

  • Report confidence intervals.

  • Handle p-values with care.

  • Mind the assumptions.

  • Report the effect size.


📅 Thanks & See You Next Week! 👋